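In multi-person 2D pose estimation, bottom-up methods predict the poses of all persons simultaneously and, unlike top-down methods, do not rely on human detection. However, state-of-the-art (SOTA) bottom-up methods are still less accurate than existing top-down methods. This is because the predicted human poses are regressed from inconsistent human bounding-box centers and lack human-scale normalization, which leads to inaccurate predicted poses and missed small-scale persons. To push the envelope of bottom-up pose estimation, we first propose multi-scale training, which enhances the network's ability to handle scale variation under single-scale testing, particularly for small-scale persons. Second, we introduce dual anatomical centers (i.e., head and body), from which human poses can be predicted more accurately and reliably, especially for small-scale persons. Moreover, existing bottom-up methods use multi-scale testing to boost pose-estimation accuracy at the price of multiple additional forward passes, which weakens the efficiency of bottom-up methods, their core strength compared with top-down methods. By contrast, our multi-scale training enables the model to predict high-quality poses in a single forward pass (i.e., single-scale testing). Our method achieves a 38.4% improvement in bounding-box precision and a 39.1% improvement in bounding-box recall over the SOTA on the challenging small-scale persons subset of COCO. For human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with single-scale testing. We also achieve the top performance (40.3 AP) on the OCHuman dataset in cross-dataset evaluation.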
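The abstract does not define how the two anatomical centers are computed. As a hedged illustration only, the sketch below assumes the body center is the mean of all visible keypoints and the head center is the mean of the visible COCO head keypoints (indices 0 to 4: nose, eyes, ears); `anatomical_centers` and `offsets_from_center` are hypothetical names. It shows the general idea behind center-based bottom-up regression: a pose is expressed as per-keypoint offsets from a center.

```python
import numpy as np

# COCO keypoint order: 0 nose, 1 left_eye, 2 right_eye, 3 left_ear, 4 right_ear, ...
HEAD_IDX = [0, 1, 2, 3, 4]  # assumption: the "head" subset is these five keypoints


def anatomical_centers(kpts: np.ndarray, vis: np.ndarray):
    """Return (head_center, body_center) for one person.

    kpts: (K, 2) array of (x, y) keypoint coordinates.
    vis:  (K,) boolean array, True where the keypoint is labeled/visible.
    Hypothetical definition: each center is the mean of the visible keypoints in its subset.
    """
    if not vis.any():
        raise ValueError("no visible keypoints")
    body_center = kpts[vis].mean(axis=0)
    head_vis = vis[HEAD_IDX]
    # Fall back to the body center when no head keypoint is visible.
    if head_vis.any():
        head_center = kpts[HEAD_IDX][head_vis].mean(axis=0)
    else:
        head_center = body_center
    return head_center, body_center


def offsets_from_center(kpts: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Pose as per-keypoint displacement vectors from a center (a regression target)."""
    return kpts - center
```

A small, distant person may have only a few reliably visible keypoints; having a second, head-based center gives the regression another anchor in that case, which is the motivation the abstract states.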
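Night images suffer not only from low light but also from uneven light distribution. Most existing night-visibility enhancement methods focus mainly on enhancing low-light regions. This inevitably leads to over-enhancement and saturation in bright regions, such as those affected by light effects (glare, floodlight, etc.). To address this problem, we need to suppress the light effects in bright regions while boosting the intensity of dark regions. With this idea in mind, we introduce an unsupervised method that integrates a layer decomposition network and a light-effects suppression network. Given a single night image as input, our decomposition network learns to decompose it into shading, reflectance, and light-effects layers, guided by unsupervised layer-specific prior losses. Our light-effects suppression network further suppresses the light effects while enhancing the illumination in dark regions. This network uses the estimated light-effects layer as guidance to focus on the light-effects regions. To recover background details and reduce hallucinations/artifacts, we propose structure and high-frequency consistency losses. Our quantitative and qualitative evaluations on real images show that our method outperforms state-of-the-art methods in suppressing night light effects and boosting the intensity of dark regions.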
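As a hedged sketch, assume a retinex-style image model in which the night image is the reflectance R multiplied element-wise by the shading S, plus an additive light-effects layer G: I = R ⊙ S + G. The abstract states neither this equation nor the exact form of the high-frequency consistency loss; the version below (mean absolute difference between high-pass residuals, each computed as the image minus a 3×3 box blur) is an illustrative assumption, and all function names are hypothetical.

```python
import numpy as np


def compose(reflectance, shading, light_effects):
    """Assumed image model: I = R * S + G (element-wise)."""
    return reflectance * shading + light_effects


def box_blur3(img):
    """3x3 mean filter on an (H, W) or (H, W, C) array with edge padding."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0


def high_freq(img):
    """High-pass residual: the detail that the blur removes."""
    return img - box_blur3(img)


def hf_consistency_loss(inp, out):
    """Assumed loss: mean |HF(input) - HF(output)|, penalizing lost or invented detail."""
    return float(np.mean(np.abs(high_freq(inp) - high_freq(out))))
```

With edge padding, adding a constant to the whole image leaves its high-pass residual unchanged, so a loss of this kind tolerates a uniform brightness offset while still penalizing changes to fine structure.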
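Removing shadows from a single image is generally still an open problem. Most existing learning-based methods use supervised learning and require a large number of paired images (shadow and corresponding shadow-free images) for training. A recent unsupervised method, Mask-ShadowGAN, addresses this limitation. However, it requires a binary mask to represent shadow regions, making it unsuitable for soft shadows. To address this problem, in this paper we propose an unsupervised domain-classifier-guided shadow-removal network, DC-ShadowNet. Specifically, we propose to integrate a shadow/shadow-free domain classifier into a generator and its discriminator, enabling them to focus on shadow regions. To train our network, we introduce novel losses based on physics-based shadow-free chromaticity, shadow-robust perceptual features, and boundary smoothness. Moreover, we show that our unsupervised network can be used for test-time training to further improve the results. Our experiments show that all these novel components allow our method to handle soft shadows and also to perform better on hard shadows, both quantitatively and qualitatively, than existing state-of-the-art shadow-removal methods.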
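The boundary-smoothness loss is only named in the abstract. A minimal sketch, under the assumption that it penalizes image gradients of the shadow-removed output inside a band around the shadow boundary, with the band taken from a soft shadow mask (for example one derived from the domain classifier); the threshold `thresh` and the function names are hypothetical.

```python
import numpy as np


def boundary_band(soft_mask, thresh=0.05):
    """Assumed boundary region: pixels where the soft shadow mask changes quickly."""
    gy, gx = np.gradient(soft_mask.astype(float))
    return np.hypot(gx, gy) > thresh


def boundary_smoothness_loss(output, soft_mask, thresh=0.05):
    """Mean gradient magnitude of a grayscale (H, W) output inside the boundary band."""
    band = boundary_band(soft_mask, thresh)
    if not band.any():
        return 0.0
    gy, gx = np.gradient(output.astype(float))
    return float(np.hypot(gx, gy)[band].mean())
```

A leftover shadow edge in the output produces a large gradient exactly where the mask changes, so it is penalized; because no binary mask is needed, the same loss applies to soft shadows.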
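We present a method for radar-inertial odometry that uses a continuous-time framework to fuse measurements from multiple automotive radars and an inertial measurement unit (IMU). Unlike camera and LiDAR sensors, radar sensors are not significantly affected by adverse weather conditions. This robustness, together with the increasing prevalence of radars on passenger vehicles, motivates us to look at the use of radar for ego-motion estimation. A continuous-time trajectory representation is applied not only as a framework that enables heterogeneous and asynchronous multi-sensor fusion, but also to facilitate efficient optimization, because poses and their derivatives can be computed in closed form at any given time along the trajectory. We compare our continuous-time estimates with those of a discrete-time radar-inertial odometry method and show that our continuous-time method outperforms the discrete-time one. To the best of our knowledge, this is the first time a continuous-time framework has been applied to radar-inertial odometry.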
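The abstract does not say which continuous-time representation is used; uniform cubic B-splines are a common choice in continuous-time estimation, so the sketch below assumes one. It covers only the translational part (rotations would need, e.g., a spline on SO(3)) and shows the property the abstract relies on: position and velocity can be evaluated in closed form at any timestamp, so asynchronous radar and IMU measurements can each be compared with the trajectory at their own time. The name `evaluate` is hypothetical.

```python
import numpy as np

# Uniform cubic B-spline basis matrix (rows: powers u^0..u^3, columns: 4 control points).
M = np.array([[1.0, 4.0, 1.0, 0.0],
              [-3.0, 0.0, 3.0, 0.0],
              [3.0, -6.0, 3.0, 0.0],
              [-1.0, 3.0, -3.0, 1.0]]) / 6.0


def evaluate(ctrl, t0, dt, t):
    """Closed-form position and velocity at time t.

    ctrl: (N, D) control points (N >= 4) on uniform knots starting at t0 with spacing dt.
    Valid for t0 <= t <= t0 + (N - 3) * dt.
    """
    n = len(ctrl)
    s = (t - t0) / dt
    if s < 0 or s > n - 3:
        raise ValueError("t outside the spline's valid time range")
    i = min(int(np.floor(s)), n - 4)  # segment index; the end time uses the last segment
    u = s - i                         # normalized time within the segment, in [0, 1]
    P = ctrl[i:i + 4]
    pos = np.array([1.0, u, u**2, u**3]) @ M @ P
    # Chain rule: d/dt = (1/dt) * d/du.
    vel = np.array([0.0, 1.0, 2.0 * u, 3.0 * u**2]) @ M @ P / dt
    return pos, vel
```

A convenient sanity check: when the control points lie on a straight line, the spline reproduces that line and its velocity is constant.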
Although many studies have successfully applied transfer learning to medical image segmentation, very few of them have investigated the selection strategy when multiple source tasks are available for transfer. In this paper, we propose a prior-knowledge-guided, transferability-based framework to select the best source tasks among a collection of brain image segmentation tasks, to improve the transfer learning performance on the given target task. The framework consists of modality analysis, RoI (region of interest) analysis, and transferability estimation, so that the source task selection can be refined step by step. Specifically, we adapt the state-of-the-art analytical transferability estimation metrics to medical image segmentation tasks and further show that their performance can be significantly boosted by filtering candidate source tasks based on modality and RoI characteristics. Our experiments on brain matter, brain tumor, and white matter hyperintensities segmentation datasets reveal that transferring from different tasks under the same modality is often more successful than transferring from the same task under different modalities. Furthermore, within the same modality, transferring from the source task that has stronger RoI shape similarity with the target task can significantly improve the final transfer performance. Such similarity can be captured using the Structural Similarity (SSIM) index in the label space.
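To make "SSIM in the label space" concrete, the sketch below computes the single-window (global) form of SSIM between two label maps, with the usual constants C1 = (0.01 L)² and C2 = (0.03 L)², where L = 1 for binary masks. Standard SSIM averages this quantity over local windows, and the abstract does not say how masks from different datasets are brought onto a common grid, so both inputs are assumed to be already aligned and equal in shape; `global_ssim` is a hypothetical name.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two same-shaped label maps (e.g. binary RoI masks)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("label maps must have the same shape")
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    luminance_contrast = (2 * mx * my + c1) / (mx**2 + my**2 + c1)
    structure = (2 * cov + c2) / (vx + vy + c2)
    return luminance_contrast * structure
```

Under these assumptions, candidate source tasks of the same modality could be ranked by the SSIM between their RoI masks and the target's, keeping the highest-scoring ones.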
Modern deep neural networks have achieved superhuman performance in tasks ranging from image classification to game play. Surprisingly, these complex systems, with massive numbers of parameters, exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have shown theoretically that the global solutions to the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We go a step further and prove that Neural Collapse occurs in deep linear networks under the popular mean squared error (MSE) and cross-entropy (CE) losses. Furthermore, we extend our analysis to imbalanced data under the MSE loss and present the first geometric analysis of Neural Collapse in this setting.
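One of the structural properties behind Neural Collapse is that the centered last-layer class means converge, up to scale, to a simplex equiangular tight frame (ETF): K vectors of equal norm whose pairwise cosines all equal -1/(K-1). The sketch below builds the standard simplex ETF, M = sqrt(K/(K-1)) (I - 11ᵀ/K), and a hypothetical helper `etf_deviation` that measures how far a matrix of class means is from that geometry. It illustrates the target geometry only, not the paper's proof technique.

```python
import numpy as np


def simplex_etf(k):
    """Columns are K unit vectors in R^K with pairwise cosine -1/(K-1)."""
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)


def etf_deviation(class_means):
    """Distance of class means (K, d) from simplex-ETF geometry.

    Returns (norm_spread, cosine_error): the relative spread of the centered norms and
    the largest |cos(mu_i, mu_j) + 1/(K-1)| over pairs i != j. Both are 0 for an exact ETF.
    """
    mu = class_means - class_means.mean(axis=0)  # center by the global mean
    norms = np.linalg.norm(mu, axis=1)
    norm_spread = (norms.max() - norms.min()) / norms.mean()
    unit = mu / norms[:, None]
    cos = unit @ unit.T
    k = len(mu)
    off_diag = ~np.eye(k, dtype=bool)
    cosine_error = np.abs(cos[off_diag] + 1.0 / (k - 1)).max()
    return float(norm_spread), float(cosine_error)
```

Scaling all class means or shifting them by a common vector does not change the result, which matches the fact that Neural Collapse is a statement about centered, normalized means.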
In this paper we derive a PAC-Bayesian-like error bound for a class of stochastic dynamical systems with inputs, namely linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics; in particular, such systems represent a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-like error bound for such systems, and 3) discuss various consequences of this error bound.
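For concreteness, the sketch below assumes the standard innovation form of a stochastic LTI state-space model with inputs: x(t+1) = A x(t) + B u(t) + K e(t), y(t) = C x(t) + D u(t) + e(t), where e is white noise. The abstract does not fix this parameterization; it is used here because it makes the link to recurrent neural networks visible: the state update is an RNN cell with a linear activation. `simulate_lti` is a hypothetical name.

```python
import numpy as np


def simulate_lti(A, B, C, D, K, u, e, x0=None):
    """Outputs y(0..T-1) of the assumed innovation-form stochastic LTI model.

    u: (T, m) inputs, e: (T, p) innovation noise samples. Returns y: (T, p).
    """
    T = u.shape[0]
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    y = np.empty((T, C.shape[0]))
    for t in range(T):
        y[t] = C @ x + D @ u[t] + e[t]   # output equation
        x = A @ x + B @ u[t] + K @ e[t]  # state update: a linear RNN cell
    return y
```

In a learning setting, the unknown quantities are the matrices (A, B, C, D, K) and the noise statistics, and the error being bounded concerns predictions of y from past inputs and outputs.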
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference, which restricts their application in real-time systems. Previous work has explored speeding up inference by reducing the number of inference steps, but at the cost of sample quality. In this work, to improve the inference speed of DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model that learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPMs, which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training that model. We verify ResGrad on the single-speaker dataset LJSpeech and on two more challenging datasets: LibriTTS (multiple speakers) and VCTK (high sampling rate). Experimental results show that, in comparison with other speed-up methods for DDPMs: 1) ResGrad achieves better sample quality at the same inference speed, measured by real-time factor; 2) at similar speech quality, ResGrad synthesizes speech more than 10 times faster than the baseline methods. Audio samples are available at https://resgrad1.github.io/.
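The core idea of ResGrad, as stated above, is to change the diffusion target from the ground-truth mel-spectrogram to the residual between it and a frozen TTS model's output. The sketch below shows only that target construction, a standard DDPM forward-noising step applied to the residual, the plug-and-play refinement at inference, and the real-time factor used for speed comparisons. The linear β schedule, its endpoints, the step count, and all function names are assumptions, and the learned denoising network is omitted.

```python
import numpy as np

T_STEPS = 50                              # assumption: number of diffusion steps
BETAS = np.linspace(1e-4, 0.05, T_STEPS)  # assumption: linear noise schedule
ALPHA_BAR = np.cumprod(1.0 - BETAS)       # cumulative product \bar{alpha}_t


def residual_target(gt_mel, base_mel):
    """Diffusion training target: what the base TTS output is missing."""
    return gt_mel - base_mel


def q_sample(x0, t, noise):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(ALPHA_BAR[t]) * x0 + np.sqrt(1.0 - ALPHA_BAR[t]) * noise


def refine(base_mel, predicted_residual):
    """Plug-and-play inference: the base model is untouched; only its output is corrected."""
    return base_mel + predicted_residual


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of generated audio (lower is faster)."""
    return synthesis_seconds / audio_seconds
```

The abstract's argument is that the residual is an easier target than the full spectrogram, so a lighter diffusion model suffices; that is where the smaller real-time factor comes from.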
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model that predicts the fitness of protein mutants by leveraging both sequence and structure information and exploiting an attention mechanism. Our model integrates the local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space, and structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy that leverages data from unsupervised models to pre-train our model. After pre-training, our model achieves strikingly high accuracy in predicting the fitness of protein mutants, especially higher-order variants (> 4 mutation sites), when fine-tuned with only a small amount of experimental mutation data (< 50 variants). The proposed strategy is of great practical value: the required experimental effort, i.e., producing a few tens of experimental mutation data points for a given protein, is generally affordable for an ordinary biochemical group, and the strategy can be applied to almost any protein.
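The two-stage recipe (pre-train on scores produced by unsupervised models, then fine-tune on fewer than 50 measured mutants) can be illustrated with a deliberately simple stand-in: a linear model on one-hot mutation features instead of SESNet, with fine-tuning expressed as ridge regression pulled toward the pre-trained weights. Everything here, including the feature encoding, the regularization strengths, and the function names, is a hypothetical illustration, not the authors' method.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode(mutations, length):
    """Hypothetical one-hot encoding of a variant given as [(position, new_aa), ...]."""
    x = np.zeros(length * len(AA))
    for pos, aa in mutations:
        x[pos * len(AA) + AA.index(aa)] = 1.0
    return x


def fit_ridge(X, y, lam, w_prior=None):
    """Closed-form argmin_w ||X w - y||^2 + lam * ||w - w_prior||^2."""
    d = X.shape[1]
    w_prior = np.zeros(d) if w_prior is None else w_prior
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * w_prior)


def pretrain_then_finetune(X_pseudo, y_pseudo, X_exp, y_exp, lam_pre=1.0, lam_ft=10.0):
    # Stage 1: many variants labeled by an unsupervised scorer (pseudo-labels).
    w_pre = fit_ridge(X_pseudo, y_pseudo, lam_pre)
    # Stage 2: a few tens of measured variants; stay close to w_pre where data is scarce.
    return fit_ridge(X_exp, y_exp, lam_ft, w_prior=w_pre)
```

The additive encoding cannot model interactions between mutation sites; the sketch only shows how the pre-training and fine-tuning stages fit together.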
Deep neural networks (DNNs) have been found to be vulnerable to adversarial attacks, and various defense methods have been proposed. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of adversarial training is greatly limited by the architecture of the target DNN, which often leaves the resulting DNN with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specifically for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. We then propose a two-stage search strategy to search for neural architectures that are both accurate and robust. In the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of adversarial training in enhancing robustness. In the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss using the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm on natural data and under various adversarial attacks, and the results reveal the superiority of the proposed method in finding architectures that are both accurate and robust. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance for both hand-crafting and automatically designing accurate and robust neural architectures.
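The two search stages differ only in the loss used to update the architecture parameters. A minimal sketch, assuming a logistic-regression model as a stand-in for the supernet and FGSM as a stand-in for the attack (the abstract names neither), and a fixed weighted sum with hypothetical weight `beta` as a stand-in for the paper's multi-objective adversarial training method, whose exact form is not given here:

```python
import numpy as np


def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {-1, +1}."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ w))))


def fgsm(w, X, y, eps):
    """One-step L-inf attack: move each input along the sign of its loss gradient."""
    margins = y * (X @ w)
    # d/dx log(1 + exp(-y w.x)) = -y * w * sigmoid(-y w.x)
    grad = -(y / (1.0 + np.exp(margins)))[:, None] * w[None, :]
    return X + eps * np.sign(grad)


def search_objective(stage, w, X, y, eps=0.1, beta=0.5):
    """Assumed per-stage objective for updating architecture parameters."""
    adv = logistic_loss(w, fgsm(w, X, y, eps), y)
    if stage == 1:
        return adv                          # stage 1: adversarial loss only
    nat = logistic_loss(w, X, y)
    return (1.0 - beta) * nat + beta * adv  # stage 2: natural + adversarial
```

Stage 1 uses only the adversarial term, matching the description above; stage 2 adds the natural loss so that clean accuracy is optimized as well.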